
Remove cudf.Scalar from shift/fillna #17922

Merged

Conversation

mroeschke (Contributor)

Description

Toward #17843

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke added the Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels on Feb 5, 2025
@mroeschke self-assigned this on Feb 5, 2025
@mroeschke requested a review from a team as a code owner on February 5, 2025 18:48
@@ -761,13 +760,21 @@ def _check_scatter_key_length(
f"{num_keys}"
)

def _scalar_to_plc_scalar(self, scalar: ScalarLike) -> plc.Scalar:
Contributor

This feels a bit odd as a class method. I feel like a free function that accepts a dtype would be more appropriate; then we could call it with col.dtype. Scoping-wise this doesn't feel like a Column method. Plus, it would then directly mirror pa_scalar_to_plc_scalar.
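For illustration, a minimal sketch of the free-function shape suggested here (assuming pyarrow plus pylibcudf's plc.interop.from_arrow accepting a pyarrow scalar; the helper name and conversion path are illustrative, not the code in this PR):

```python
import pyarrow as pa
import pylibcudf as plc


def scalar_to_plc_scalar(scalar, pa_type: pa.DataType) -> plc.Scalar:
    """Hypothetical free-function counterpart to the method above.

    Callers would pass the Arrow type derived from col.dtype, directly
    mirroring the existing pa_scalar_to_plc_scalar helper.
    """
    # pa.scalar applies the requested type (None becomes a typed null);
    # from_arrow then converts the Arrow scalar into a pylibcudf Scalar.
    return plc.interop.from_arrow(pa.scalar(scalar, type=pa_type))
```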

Contributor

I guess there is currently a small benefit because we can override this method for decimal columns to get the specialized behavior that we need, but I think that we don't need that any more (see my comment on that class).

Contributor Author (mroeschke)

Now that I've closed #18035, which was an attempt to avoid this decimal special casing, do you still feel strongly about having this as a free function? I chose a class method because, as you mentioned, I am able to customize this for decimal, and it's a little more obvious when I could remove this in the future.

Contributor

No, I think it's fine to leave it as is for now.

isinstance(fill_value, np.datetime64)
and self.time_unit != np.datetime_data(fill_value)[0]
):
# TODO: Disallow this cast
Contributor

I feel like a lot of your PRs have had these kinds of comments. Do they all fall into similar buckets? Should we open some issues for tracking?

Contributor Author (mroeschke)

Actually, I double-checked pandas, and our casts here match the pandas behavior for this method (although I'm not fond of it).
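For context, a quick NumPy-only illustration of what the unit check in the snippet above compares (a minimal sketch; the column's time_unit is assumed to be the unit string of its datetime64 dtype):

```python
import numpy as np

fill_value = np.datetime64("2024-01-01", "ms")

# np.datetime_data returns a (unit, count) pair for a datetime64/timedelta64
# dtype; element [0] is the unit string compared against the column's unit.
print(np.datetime_data(fill_value.dtype))             # ('ms', 1)
print(np.datetime_data(np.dtype("datetime64[ns]")))   # ('ns', 1)
# A mismatch like ms vs ns is what triggers the cast marked with the TODO.
```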

Contributor

The question about tracking TODOs still applies, but I'm good with holding off on that. Grepping the codebase for these small things is OK with me for now, given how much we're churning internally anyway.

@@ -168,16 +170,35 @@ def _binaryop(self, other: ColumnBinaryOperand, op: str):

return result

def _scalar_to_plc_scalar(self, scalar: ScalarLike) -> plc.Scalar:
Contributor

Now that #17422 is merged I think we can stop special-casing this and see if anything breaks. WDYT? It does mean that decimal conversions in tests will fail if run with an older version of pyarrow, but I think that's an OK tradeoff. We might have to put some conditional xfails into our test suite for the "oldest" test runs.
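As a rough sketch of the conditional-xfail idea for the "oldest" dependency runs (assuming a pyarrow version gate; the marker name and version threshold are illustrative):

```python
import pyarrow as pa
import pytest
from packaging.version import parse

# Hypothetical marker: expect failure when running against a pyarrow that
# predates the decimal support relied on after #17422.
requires_recent_pyarrow = pytest.mark.xfail(
    parse(pa.__version__) < parse("19.0.0"),
    reason="decimal conversion requires a newer pyarrow",
)


@requires_recent_pyarrow
def test_decimal_roundtrip():
    ...
```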

Contributor Author (mroeschke)

I opened #18035, dedicated to avoiding the decimal special casing.

We can discuss on that PR, but IIUC, to avoid these conversions on the Python side we would need pyarrow APIs that were introduced in pyarrow 19.
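For reference, a hedged sketch of the kind of pyarrow 19+ API being referred to (assuming the narrow decimal type constructors added around that release; the exact API cudf needs may differ):

```python
import pyarrow as pa

# Assumption: recent pyarrow exposes 32/64-bit decimal types alongside
# decimal128/decimal256; on older releases these constructors do not exist,
# which is why the Python-side decimal special casing is still needed there.
dec32 = pa.decimal32(7, 2)    # precision 7, scale 2
dec64 = pa.decimal64(15, 4)   # precision 15, scale 4
print(dec32, dec64)
```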

Contributor

Good call. Let's discuss there; I responded in #18035 (comment).

@vyasr (Contributor) commented Mar 4, 2025

Approving since we're putting a pin in #18035.

@mroeschke (Contributor Author)

/merge

1 similar comment
@vyasr (Contributor) commented Mar 4, 2025

/merge

@rapids-bot (bot) merged commit 45d8066 into rapidsai:branch-25.04 on Mar 4, 2025
106 checks passed